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Abstract — We consider the problem of recursively and causally 
reconstructing time sequences of sparse signals (with unknown 
and time-varying sparsity patterns) from a limited number of 
noisy linear measurements. The sparsity pattern is assumed to 
change slowly with time. The idea of our proposed solution, LS- 
CS-residual (LS-CS), is to replace compressed sensing (CS) on the 
observation by CS on the least squares (LS) residual computed 
using the previous estimate of the support. We bound CS-residual 
error and show that when the number of available measurements 
is small, the bound is much smaller than that on CS error if the 
sparsity pattern changes slowly enough. We also obtain conditions 
for "stability" of LS-CS over time for a signal model that allows 
support additions and removals, and that allows coefficients to 
gradually increase (decrease) until they reach a constant value 
(become zero). By "stability", we mean that the number of misses 
and extras in the support estimate remain bounded by time- 
invariant values (in turn implying a time-invariant bound on LS- 
CS error). The concept is meaningful only if the bounds are small 
compared to the support size. Numerical experiments backing 
our claims are shown (also include a dynamic MRI example). 



I. Introduction 

Consider the problem of recursively and causally recon- 
structing time sequences of spatially sparse signals (with 
unknown and time-varying sparsity patterns) from a limited 
number of linear incoherent measurements with additive noise. 
The signals are sparse in some transform domain referred to 
as the "sparsity basis" J5). An important example is dynamic 
magnetic resonance (MR) image reconstruction of deforming 
brain shapes in real-time applications such as MR image 
guided surgery or functional MRI [4|. Human organ images 
are piecewise smooth and so the wavelet transform is a valid 
sparsity basis 0. MRI captures a limited number of Fourier 
coefficients of the image, which are incoherent with respect 
to the wavelet transform J5), 0. Other examples include real- 
time estimation of time-varying spatial fields using sensor 
networks, real-time single-pixel video imaging (7), and video 
compression. Due to strong temporal dependency in the signal 
sequence, it is usually valid to assume that its sparsity pattern 
( support of the sparsity transform vector) changes slowly over 
time. This in verified in Fig. [Hand in fl8j, (9)- 

The solution to the static version of the above problem is 
provided by compressed sensing (CS) O, iflOl . CS for noisy 
observations, e.g. Dantzig selector J6), Basis Pursuit Denoising 
(BPDN) Oil, 02 or Lasso 03), 03), have been shown to 
have small error as long as incoherence assumptions hold. 
Most existing solutions for the dynamic problem, e.g. J7), OS), 
are non-causal and batch solutions. Batch solutions process 
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the entire time sequence in one go and thus have much higher 
reconstruction complexity. An alternative would be to apply 
CS at each time separately (simple CS), which is online and 
low-complexity, but since it does not use past observations, 
its reconstruction error is much larger when the number of 
available observations is small [see Table U or Figs. |4(a)] [5). 

The question is: for a time sequence of sparse signals, how 
can we obtain a recursive solution that improves the accuracy 
of simple CS by using past observations? By "recursive", we 
mean a solution that uses only the previous signal estimate 
and the current observation vector at the current time. The 
key idea of our proposed solution, LS-CS-residual (LS-CS), 
is to replace CS on the observation by CS on the least squares 
(LS) residual computed using the previous support estimate. 
Its complexity is equal to that of simple CS which is 0(Nm 3 ) 
where m is the signal length and N is the time duration lfl6l 
Table 1]. Compare this to 0(N 3 m 3 ) for a batch solution. 

Other somewhat related work includes lfT7l . |[T8l (use the 
previous estimate to speed up the current optimization, but not 
to improve reconstruction error), and (T9| (does not allow the 
support to change over time). Both ||T8l . Ifl9) appeared after 
JT). The work of l20l gives an approximate batch solution for 
dynamic MRI which is quite fast (but offline). Some other 
related work, but all for reconstructing a single sparse signal, 
includes \2\\ (uses a recursive algorithm) and ll22l (related 
model, but offline algorithm). Finally, none of these bound 
the reconstruction error or show stability over time. 

In this work, we do "CS", whether in simple CS or in CS- 
residual, using the Dantzig selector (DS) J6). This choice was 
motivated by the fact that its guarantees are stronger and its 
results are simpler to apply/modify (they depend only on signal 
support size) as compared to those for BPDN given in lfl2l 
(these depend on the actual support elements). In later work 
E), |9), for practical experiments with larger sized images, 
we have also used BPDN since it runs faster. Between DS 
and Lasso (|o constrained l\ minimization) lfl3l . fl4l . either 
can be used^J. If Lasso is used, by using the results from |14j 
as the starting point, results analogous to our Theorems Q] and 
|2] can be proved in exactly the same way and the implications 
will also remain the same. When using BPDN, obtaining a 
result similar to Theorem[T]is easy [9 ]. With a little more work 
it should be possible to also show stability as in Theorem [2] 

The LS-CS-residual (LS-CS) algorithm is developed in Sec. 
HT1 We bound its error and compare it with CS in Sec. Hn] 
Conditions for "stability" are obtained in Sec. [TV] Numerical 

'When we started this work, 1141 had not appeared and the best bounds 
for Lasso were from 1131 . These are proved under slightly stronger sufficient 
conditions than the DS results (6). Also, as argued in the discussion of j6], 
the DS bounds are smaller. Hence we chose to use DS. 
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(a) Top: larynx image sequence, Bottom: cardiac sequence 
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(b) Slow support change plots. Left: additions, Right: removals 
Fig. 1, In Fig. |l(a)[ we show two medical image sequences. 
In Fig. |l(b)| Nt refers to the 99% energy support of the two- 
level Daubechies-4 2D discrete wavelet transform (DWT) of these 
sequences. |iV t | varied between 4121-4183 (« 0.07m) for larynx 
and between 1108-1127 (~ 0.06m) for cardiac. We plot the number 
of additions (left) and the number of removals (right) as a fraction 
of \Nt\. Notice that all changes are less than 2% of the support size. 

experiments are given in Sec. [VI and conclusions in Sec. |VT] 
Sections marked with ** may be skipped. 

A. Notation 

The set operations U, fl, and \ have the usual meanings. T c 
denotes the complement of T w.r.t. [1, m] := [1, 2, . . . m], i.e. 
T c := [l,m] \ T. \T\ denotes the size (cardinality) of T. 

For a vector, v, and a set, T, vt denotes the \T\ length 
sub-vector containing the elements of v corresponding to the 
indices in the set T. \\v\\k denotes the £fe norm of a vector v. 
If just ||i>|| is used, it refers to ||i'|| 2 . Also, refers to the 
jth laj-gggt magnitude element of v (notation taken from Q). 
Thus, for an m length vector, \vm\ > |v( 2 )| • • • > |«( m )|. 

We use the notation v(S) to denote the sub-vector of v 
containing the S smallest magnitude elements of v. 

For a matrix M, ||M||fc denotes its induced fc-norm, while 
just ||M|| refers to ||M||2. M' denotes the transpose of M. 
For a tall matrix, M, M* := {M'M)~ l M'. 

For a fat matrix A, At denotes the sub-matrix obtained by 
extracting the columns of A corresponding to the indices in 
T. The S'-restricted isometry constant [3], 5$, for an n x m 
matrix (with n < rri), A, is the smallest real number satisfying 



(l-^)!|c|| 2 <p TC || 2 <(l + f J s )||c|| 



(1) 



for all subsets T C [l,m] of cardinality \T\ < S and all real 
vectors c of length \T\. The restricted orthogonality constant 
(0, 8s,s', is the smallest real number satisfying 



\ Ci 'A Ti 'At 2 C2\ < 6> s ,s'||ci||||c 2 | 



(2) 



for all disjoint sets T U T 2 C [l,m] with |T X | < S, |T 2 | < 5', 
S + 5' < m, and for all vectors ci, c 2 of length |Ti|, |T 2 |. 

B. Problem Definition and Some More Notation 

Let (zt) m xi denote the spatial signal at time t and (j/t)„ x i, 
with n < m, denote its noise-corrupted measurements' vector 



at t, i.e. y f = Hz t + Wt where w t is measurement noise. The 
signal, z t , is sparse in a given sparsity basis (e.g. wavelet) with 
orthonormal basis matrix, $ TOXm , i.e. Xt = $'zt is a sparse 
vector. We denote its support by N t . Thus the observation 
model is 

Vt = Ax t + Wt , A = H$, E[wt] = 0, E[w t w' t ] = a 2 f (3) 

We assume that A has unit norm columns. Our goal is to 
recursively estimate xt (or equivalently the signal, z t = Qxt) 
using yi, ... y t . By recursively, we mean, use only y t and the 
estimate from t — 1, it-i, to compute the estimate at t. 

We state our assumptions after the following definition. 

Definition 1 (Define S 1 *, S**): For A := 

1) let 5* denote the largest S for which 5s < 1/2, 

2) let 5** denote the largest S for which <5 2 s + @s,2S < 1. 
Assumption 1: In the entire paper, we assume the following. 

1) Sparsity and Slow Support Change. The support size, 
\N t \ « \N \ < m and the additions, \N t \ N t -i\ < 
S a < | TVo | and the removals, |iV t _i\JV t | < S a < |iV |. 

2) Incoherence. A satisfies S a < 5*** and |JVt| + k < S* 
for some k > (as we argue later fc <C |AT t | suffices). 

Sparsity and slow support change is verified in Fig.Q] Incoher- 
ence (approximate orthonormality of 5-column sub-matrices 
of A) is known to hold with high probability (w.h.p.) when A 
is a random Gaussian, Rademacher or partial Fourier matrix 
and n is large enough 0, 0. 

1 ) More Notation: We use it to denote the estimate of Xt 
given by our algorithm at time t and Nt to denote its support 
estimate. To keep notation simple, we avoid using the subscript 
t wherever possible. We will use the following sets often. 

Definition 2 (Define T, A, A e ): We use T := N t -i to 
denote the support estimate from the previous time. This serves 
as an initial estimate of the current support. We use A := N t \T 
to denote the unknown part of the support at the current time. 
We use A e :=T\ N t to denote the "erroneous" part of T. 

Definition 3 (Define f, A, A e ): We use f := N t to denote 
the final estimate of the current support. We use A := Nt \ T 
to denote the "misses" in the final estimate and A e := T\ N t 
to denote the "extras". 

Some more notation - xcsi-es, Adet, ?det> a, <*deU - is defined 
when we give our algorithm in Sec. HU 

II. Least Squares CS-residual (LS-CS) 
Given observation, y, the Dantzig selector |6) solves 

min HCIIi s.t. \\A'(y-AC)\\oo<X (4) 
C 

Now consider the recursive reconstruction problem. If the 
support of xt, N, were known at each t, we could simply 
compute its least squares (LS) estimate along Nt while setting 
all other values to zero. We refer to this estimate as the "genie- 
aided" LS estimate. When Nt is not known, one could do 
simple CS at each time, i.e. solve (0]i with y = y t , followed by 
thresholding the output to estimate its support, and then do the 
same thing using the support estimate Nt instead of Nt. But 
in doing so, we are throwing away the information contained 
in past observations. If the available number of measurements, 
ji, is small, this incurs large error [see Table HI. 
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To use the information contained in past observations, along 
with the knowledge that support changes slowly, we propose 
the following idea. Assume for a moment that the support has 
not changed from t — 1 to t. Use T :— Nt-i to compute an 
initial LS estimate and compute the LS residual, i.e. compute 

(x t ,Mt)T = A T ^y t , (x tMt ) T c = 

ytjes =Vt — Ax t ,Mt 

Notice that the LS residual, yt,te$, can be rewritten as 

2/t,res = A/3 t +W t , fit := X t - X t ,mit (5) 

where (3t is a \T U A|-sparse vector with (/3t)(TUA)° = 0, 

(A)t = [xt)T - A^yt = -A T ] {w t + A A (x t ) A ), 
(A)a = (x t )A - = (x t ) A (6) 

The above follows because At At = I and y t = Ax t +wt — 
A N (x t ) N + w t = A NnT {x t )NnT + A NnT -(x t ) N nT- + w t = 
At{xi)t + A A (xt) A + Wf Here N = N t . The last equality 
holds because A = N n T c and N n T C T. 

Notice that A C (iV t \ N t -i) U (AT t _i \ T) and A e C 
(JV t _i \ N t ) U (T \ JV t _i). From Sec. HI |JV t \ N t -i\ < S a 
and |iV t _i \ N t \ < S a . If |A| and |A e | are small enough (i.e. 
if S a is small enough and T is an accurate enough estimate of 
Nf-i) and A is incoherent enough to ensure that 
is small enough, /3 t will be small (compressible) along T. In 
other words, (3 t will be only | A|-approximately-sparse. In this 
case, doing CS on y t les should incur much less error than 
doing CS on y t (simple CS), which needs to reconstruct a 
|iVf|-sparse signal, Xt- This is the key idea of our approach. 

We do CS on the LS residual (CS-residual), i.e. we solve 
(j4]i with y = yt,Ks and denote its output by f3 t . Now, 

X t ,CS KS ■= $t + X t ,imt ( 7 ) 

can serve as one possible estimate of xt- But, as explained in 
J6), since fit is obtained after l\ norm minimization, it will 
be biased towards zero. Thus, i^csres will also be biased. We 
can use the Gauss-Dantzig selector trick of [6| to reduce the 
bias. To do that, we first detect the new additions as follows: 

iV Met = TD{i G [l,m] : |(x t ,cs res )i| > a} (8) 

and then we use Tdet '■= ^t.det to compute an LS estimate 

(x t At)f dcl = Afjyt, (£Met) ( f dc ,)- = ( 9 ) 

If 7/det = Nt, i*,det will be unbiased. In fact, it will be the best 
linear unbiased estimate, in terms of minimizing the mean 
squared error (MSE). But even if T^ et is roughly accurate, the 
bias and MSE will be significantly reduced. 

If the addition threshold, a, is not large enough, occasion- 
ally there will be some false detections (coefficients whose 
true value is zero but they wrongly get detected due to error 
in the CS-residual step). Also, there may have been actual 
removals from the true support. This necessitates a "deletion" 
step to delete these elements from the support estimate. If this 
is not done, the estimated support size could keep increasing 
over time, causing At' At to become more and more ill- 
conditioned and the initial LS estimates to become more and 
more inaccurate. An example of this is shown in Fig. |4(c)| 



A simple approach to do deletion is to apply thresholding 
to Xt,det, i.e. to compute 

N t = Tdet Tdet : |(£*,det)i| < a de i } (10) 

The above is better than deleting using a^csres which, as 
explained above, usually has a larger MSE. 

A final LS estimate can be computed using T := N t . 

[xtjf = Aj.'y t , (x t )fc=0 (11) 

In most places in the paper, we use "addition " ( "removal ") 
to refer to additions (removals) from the actual support, while 
using "detection " ( "deletion ") to refer to additions (removals) 
from the support estimate. Occasionally, this is not followed. 

A. LS CS-residual (LS-CS) Algorithm and More Notation 

We give the LS CS-residual (LS-CS) algorithm below. 
Initialization (t = 0): At the initial time, t = 0, we run simple 
CS with a large enough number of measurements, no > n 
(usually much larger), i.e. we solve with y = yo and A = 
Aq = (Hq, and hence Aq, will be an no x m matrix). This 
is followed by support estimation and then LS estimation as 
in the Gauss-Dantzig selector [6 J. We denote the final output 
by xo and its estimated support by Nq. For t > do, 

1) Initial LS. Use T := Nt-i to compute the initial LS 
estimate, x^init, and the LS residual, yt,res, using (|5j. 

2) CS-residual. Do CS (Dantzig selector) on the LS 
residual, i.e. solve with y = y f res and denote its 
output by j3f Compute i tj csres using 0. 

3) Detection and LS. Use (© to detect additions to the 
support to get Td et := -/V f .d e t- Compute the LS estimate, 
£ t ,det, using Tdet, as given in ©. 

4) Deletion and LS. Use ( fTol > to detect deletions from the 
support to get T := N t . Compute the LS estimate, it, 
using T, as given in ( fTTT i. 

5) Output it and z t = Feedback N t . 
Increment t and go to step [TJ 

We define the following which will be used in Sec. IIVI 
Definition 4 (Define Td e t, ^-det, A e ^ cf/ ): We define Td et := 
iVt,det, ^det := N t \ T det and A e , det := T det \ N t . 

B. Threshold Setting Heuristics 

If n is large enough (to ensure that A jv t is well-conditioned), 
a thumb rule from literature is to set a at roughly the noise 
level Q. If the SNR is high enough, we recommend setting 
adei to a larger value (since in that case, Xdet will have much 
smaller error than icsies), while in other cases, add — ol is 
better. Higher oidel ensures quicker deletion of extras. 

When n is not large enough, instead of setting an explicit 
threshold a, one should keep adding the largest magnitude 
elements of idet until Af ^ exceeds a condition number 
threshold, adei can be set to a fraction of the minimum nonzero 
coefficient value (if known). Also, add should again be larger 
than the implicit a being used. 

Another heuristic, which ensures robustness to occasional 
large noise, is to limit the maximum number of detections at 
a given time to a little more than S a (if S a is known). 
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C. Kahnan filtered CS-residual (KF-CS) 

Now, LS-CS does not use the values of (xt-i)T to improve 
the current estimate. But, often, in practice, coefficient values 
also change slowly. To use this information, we can replace 
the initial LS estimate by a regularized LS estimate. If training 
data is available to learn a linear prior model for signal 
coefficients' change, this can be done by using a Kalman filter 
(KF). We develop and study the KF-CS algorithm in 0], ED- 
As we demonstrate in l23l . KF-CS significantly improves upon 
LS-CS when n is small and so At can occasionally become 
ill-conditioned (this results in LS-CS instability, but does not 
affect KF-CS much, as long as this occurs only occasionally). 

III. Bounding CS-residual Error 

We first bound the CS-residual reconstruction error and 
compare it with the bound on CS error. In Sec. IIII-CI we 
give a tighter bound on the CS-residual error, but which holds 
under stronger assumptions. All bounds depend on \T\, |A|. 

To simplify notation, in this section, we remove the subscript 
t. Consider reconstructing x with support, N, from y := Ax + 
w. The support can be written as N = (TU A) \ A e where T 
is the "known" part of the support (equal to support estimate 
from the previous time), A e := T \ N and A := N \ T. 

A. Bounding CS-residual Error 

If n is large enough so that \T\ + |A| = \N\ + \A e \ < 
S 1 **, then we can use the bounds given in Theorems 1.1 or 
1.2 of 121 to bound the CS-residual error. But recall that CS- 
residual is primarily designed for situations where n is smaller. 
It applies CS to the observation residual y = A(3 + w where 
(3 := x— ii n it is a (|T| + | A|)-sparse signal, that is compressible 
(small) along T. To bound its error, we first prove Lemma [T] 
which modifies Theorem 1.3 of J6| to apply it to "sparse- 
compressible signals", i.e. sparse signals that are partly (or 
fully) compressible. Next, we bound \\Pt\\ (the "compressible" 
part of B). Finally we use this lemma along with the bound 
on ||/?t|| to obtain the CS-residual error bound. 

Lemma 1 (CS error bound - sparse-compressible signal): 
Assume that ||w||oo < JMl (bounded noise). Let ( is 
an SV^-sparse vector with support T nz , and we measure 
y := A( + w. Its estimate, obtained by solving (0]), obeys 
the following. For all sets T rest C T nz of size |T res t| = S nz — S 
and for all 1 < S < min(S'**, S nz ), 



IIC-CII 2 <C 2 (S)SX 2 + C 3 (S) 



IKOrJIj 
S 



< C 2 {S)SX 2 + C 3 (S) 



^ {Snz 5) II(0tJ| 2 , 



5' 



where C 2 (5) 



48 



(1 - S 2S - S ,2S) 2 ' 



and C 3 {S) = 



'S,2S 



(1 - 5 2 s - 



(12) 



'S,2S) 



The proof is given in Appendix lAl Recall that 5** is defined 
in Definition Q] 



Recall that B 



can be rewritten as 



B T = -{At A T ) 1 A T '(A A x A + w) 
f3 A =x A , (/3)(tuA)<» = (13) 

As long as 5\t\ < 1, ||/3r|| 2 can be bounded as follows. 

\\Pt\\ 2 < 2[\\(A T 'A T r 1 A T 'A A x A \\ 2 + WiAT'AT^AT'wf 

2 n 



< 2 



(1 



6\t\) 2 



x A 



(l-5 m ) 



(14) 



The above follows by using (a) ||(At'^4t) 1 At 



'112 



(if 8\ T \ < 1) and (b) 



KAt'At)- 1 ]] < 1/(1 - S m ) 

\\At'A a \\ < 0\t\,\a\- ( a ) follows because ||(At'At) 1 || is 
the inverse of the minimum eigenvalue of At' At- (b) follows 
from © by setting ci = A Tl 'A T2 c 2 , T x = T, and T 2 = A. 

From (O, it is clear that if the noise is small and if |A|, 
|A e | are small enough so that 9 is small, \\Bt\\ is small. Using 
( fl~4b along with Lemma [T] we can prove the following. 

Theorem 1 (CS-residual error bound): Assume that \T\ := 
\N\ + |A e | - |A| < S. and \\wWc < Then, 



IF - a;csn 

FcSTes(S) 

B(S)-- 



< min -Fcsres(<S'), where 

l<S<min(S,»,|T| + |A|) 



[C 2 (S)SX 2 + C 3 (S) 



ITI + IAI 



s 



S 



B(S)] 



? 2 ||x A || 2 + 4|H| 2 if5>|A| 
l 2 \\x A \\ 2 +M\w\\ 2 + ||x A (|A| - 5)|| 2 if S < |A| 

(15) 



where 9 :~ ^|t|.|A| an d C 2 (S), C 3 (S) are defined in (1121) . 

The proof is given in Appendix [A] Recall: x A (K) is the 
vector containing the K smallest magnitude elements of x A . 

A simple corollary of the above result follows by applying 
it for a particular value of S, S = |A| when |A| > 0. This 
will usually result in the smallest bound in situations where 
SU is not much larger than |A|. When |A| = 0, f3 T = A T ^w 
which is anyway quite small. It is not immediately clear which 
value of 5* is the best. We retain the min in this case. We also 
bound ||u>|| by its maximum value, -y/nA/ll-AHi. 

Corollary 1 (simpler CS-residual error bound): Assume 
that |H|oo < A/PHi, |A| < SU and |T| < S*. 
1) If |A| > 0, 



ll^-XcSresH 2 <C 

C' = C'(\T\,\A\):= 



C"0| TMA | 2 ||.T A || 2 , 



where 



C7 2 (|A|)|A|A 2 +4c7 3 (|A|) 



\T\ n\ 2 



C" = C"(|T|, |A|) := 8C 3 (|A|)|r| 
2) If |A| = 0, \\x ~ xcsresll 2 < B where 



B := min \C 2 (S)SX 2 

1<S<S„ 



C 3 (S) 



\T\-SAnX 2 , 



S 



\a\w 



\A\\i 

(16) 



(17) 



where C 2 (S), C 3 {S) are defined in (O. 

This corollary is used in the LS-CS stability result. 

Remark 1: It is easy to modify the above results for Gaus- 
sian noise, w ~ A/"(0, cr 2 /), in a fashion analogous to the 
results of (6). Like (6J, we will also get "large probability" 
results. We do not do this here because any large probability 
result will make the study of stability over time difficult. 
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Remark 2: In the bounds of Theorem[T]or Corollary Q] there 
is a term that is proportional to (|T| + |A| - S) or to |T| 
respectively. This comes from Lemma [T] when we bound the 
h norm term, ||(C)tJ| 2 , by (S nz - 5) times ||(C)tJ| 2 . A 
similar term is also there in the CS bound given below in ( fT8l ). 
This is the bound for CS error which holds under the same 
weak assumptions as those used by our resuljl 

B. Comparing CS-residual and CS error bounds 

We now compare the CS-residual bound with that of CS. 

Remark 3: By showing that the upper bound on CS-residual 
error is much smaller, we only show that the performance 
guarantees for CS-residual are better than those for CS. To 
actually compare their errors, we use simulations. 

To compare the CS-residual bound with CS, first note that 
the CS error bounds for sparse signals given in Theorems 1 . 1 
and 1.2 of apply only when 62s + 9s.2S < 1 holds for 
S = \N\, i.e. when \N\ < SU- When n is small and this 
does not hold, these results are not applicable. On the other 
hand, Lemma[T]does not assume anything about 5**. Let xcs 
denote the simple CS output. Using Lemma Q] it is easy to 
see that 

\\x -x cs f < min F CS (S), where 

l<S<min(S»«,|iV|) 



F CS (S) := [C 2 (S)SX 2 + C 3 (S) 
Bcs(S) := \\x N (\N\~S)\\ 2 



\N\ 



-Bcs(S)} 



(18) 



Compare (TT3T > with ( TT8l under the following assumptions. 

1) The magnitude of the largest element of a; a is smaller 
than or equal to that of the smallest element of Xn\a> 
i.e. |(xa)(i)| < |(£at\a)(at\A)|- This is reasonable since 
A contains the recently added elements which will 
typically be smaller while N \ A contains the previously 
added elements which should have a larger value. 

2) |A|, |A e | are small enough (i.e. S a is small enough and 
T w Nt-i) and the noise is small enough so that 

a) |A| < 0.1 |7V| and |A e | < 0.1|AT| (this ensures that 
|T| < 1.1|JV|). 

b) Vl.lA, 2 < 1/8, and ||«,||» < ^f^ . 

3) n is small so that S*** = 0.2|iV|, but is just large 
enough so that S* > l.l\N\. S* > 1.1\N\ along with 
assumption [2al ensures that S\t\ < 1/2. 

The above assumptions ensure that ||/3ri| 2 < 80 2 ||xa|| 2 + 
4|M| 2 < 80 2 ||.t jvxa (|A|)|| 2 +4|| w || 2 < ^aCADI! 2 , i.e. 
Pt is "compressible enough". 

Under the above assumptions, we show that i ? csres(<S') in 
(JT3J is significantly smaller than Fcs(S) in (fT8l for each value 
of S and hence the same will hold for the upper boundfl 

2 A term containing the l\ norm of the "compressible" part appears in all 
bounds for CS for compressible signals, e.g. (6] Thm 1.3], and hence also 
appears when we try to bound CS error for sparse signals with not-enough 
measurements. 

3 Notice that ^4§l < a for all S implies that mi "s ffatoOg) < a for 
all S. Since this holds for all S, it also holds for the max taken over S, i.e. 

min s frSri-s(S) _ min s -FcSres(S) 



For any S, the first term in Fcs res (S) and Fcs(S) is the 
same. In the second term, the main difference is in B(S) 
versus Bcs(S). The constants are almost the same, their 
1 + < 9/8 (follows since 



ratio is 



V| + |A e |-S 



\N\-S * 1 |JV|-S 

S < 5„ = 0.2|JV| and |A e | < 0.1|7V|). Thus if we can show 
that B(S) is much smaller than Bcs(S), we will be done. 
First consider |A|<5<0.2|iV|. In this case, 



B(S) 



9 2 ||x A || 2 +4|| U ;|| 2 



< ||^\ A (|A|)|| 2 < ||^A(0.1|iV|)|| 2 
Bcs(S) > B CS (0.2\N\) = ||a?j>r(0.8|JV|)|| a 

= bA|| 2 + ||^ A (0.8|iV|-|A|)|| 2 
>0+||x Ar \ A (0.7|7V|)|| 2 
>A\x N \^.\\N\)f >1B{S) (19) 

Now consider 1 < S < |A| 

B(S) = 89 2 \\x A \\ 2 + 4||u.|| 2 + ||.t a (|A| - 5)|| 2 



< II^\a(|A| 



\Xa\ 



<2||x 7VVA (|A|)|| 2 <2||a; ArX A(0.1|iV|)|| 2 
Bcs(S) > Bos(|A|) > B CS (QA\N\) - ||^(0.9|^|)|| 2 
> ll^\A(0.8|iV|)|| 2 

>8\\x N \ A (0.1\N\)\\ 2 >4B(S) (20) 



Thus 



, < 1/4 in all cases. Denote the common first 
term in Fcs re s and Fes by Tl. Denote the second terms 
by T2 CSres and T2 CS . Thus, ^ = m $tf S < 

(l/4)(9/8) = 9/32. Thus, Fcs^S) < F CS (S) for all 
S < 5**. By footnote [3] the same holds for the bounds. 

Furthermore, if the noise is small enough, and for S < S**, 
the second term is the dominant term in Fcs(S), i.e. Tl <C 

TT_„ TKon -PcsrcstS) __ Tl i Tgcsreg ^ T2 CSrL . s ^ n/09 
iz.es- inen Fcs(s) — T1+T2 CS T1+T2 CS ~ T2 C s - y / oz ' 

i.e. Fcs KS (S)/F cs (S) is also roughly less than 9/32 for all S. 
From footnote|3] this means that the CS-residual bound is also 
roughly (9/32) times the CS bound (is significantly smaller). 

If |A| =0, assumption |2b1 does not hold and so the above 
comparison does not hold. But clearly, under high enough 
SNR, B in Corollary Q] is much smaller than ( fl~8l . 

1) Monte Carlo Comparison: We compared CS-residual 
error with CS error using simulations for a single time instant 
problem with m = 200, \N\ = 20, |A| = |A e | = 0.1|iV| = 2 
and for n = 45, 59, 100. We compare the normalized MSE's 
in Table U The CS-residual error is much smaller than the CS 
error except when n = 100 (large) in which case the errors 
are roughly equal. See Sec. IV-AI for details. 



C. Tighter CS-residual Bound under Stronger Assumptions** 

To address an anonymous reviewer's comment, we give 
below a tighter error bound for CS-residual (does not contain a 
term proportional to |T|). But this also holds under a stronger 
assumption. This section can be skipped in a quick reading. 



Using O, IK/?) 



T||l 



< \\A T *A 



A 1 ZA 1 



min s F CS (S) 



maxg 



Fcs(S) 



< a. 



Notice that if the noise is small and |A| is small, ||(/3)t||i will 
be small. In particular, if ||(/?)t||i < ^II^aHi, by applying the 
first inequality of Lemma Q] with ( — /3, S nz = \T\ + |A|, 
S = |A| and T rest = T; using |jx A ||i < VTaIH^aII; and 
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combining the resulting bound with that given in Corollary Q] 
we get the following. 

Corollary 2: Assume that 



1) IMloo < 



|A| < S. 



\T\ < S*, 

2) IIAtUaH! <cand II.TAH! > I|At+u 
then, 



6-||-4 T tA A ||i 
x-icsresll 2 < C 2 (|A|)|A|A 2 + C 3 (|A|) X 



for a b > 



i(6 2 ||x A || 2 , 8|T|0 2 ||a: A | 



■4ITI 



(21) 



If |A| = 0, condition [2] cannot hold. In this case, ||ie — 
icsres|| 2 < Bq with Bq defined in Corollary Q] ■ 

Notice that the first term in the min does not contain \T\. 
The above bound is tighter when this first term is smaller, i.e. 
b is small enough (happens if |A| small but ||xa||oo large). 

IV. LS-CS Stability 

So far we bounded CS-residual error as a function of 
\T\, |A|. The bound is small as long as |A e | and |A| are small. 
A similar bound on LS-CS error as a function of |T|, |A| is 
easy to obtain. The next questions are: 

1) Under what conditions on the measurement model and 
the signal model, will the number of extras, |A e |, and 
the number of misses, |A|, and hence also |A e |, |A|, be 
bounded by a time-invariant value, i.e. be "stable"? This 
will imply a time-invariant bound on LS-CS error. 

2) If additions/removals occur every-so-often, under what 
conditions can we claim that |A e |, |A| will become zero 
within a finite delay of an addition time? This will mean 
that the LS-CS estimate becomes equal to the genie- 
aided LS estimate (LS estimate computed using N t ). 

The answers to both these questions are interrelated and are 
given in a single theorem. Of course as mentioned earlier, 
"stability" is meaningful only if the bounds on the misses and 
extras are small compared to the support size. 

A. Signal Model 

For studying stability, we need to assume a signal model. 
We assume the following deterministic model that (a) assumes 
a nonzero delay between new coefficient addition times, (b) 
allows a new coefficient magnitude to gradually increase from 
zero for sometime and finally reach a constant value and (c) 
allows coefficients to gradually decrease and become zero (get 
removed from support). At t = 0, we assume that .To is (So — 
S a ) sparse with all "large" coefficients with values ±M. 

Signal Model 1: The model is as follows. 

1) Initialization. At t = 0, xo is (So — S a ) sparse. All its 
nonzero coefficients have values ±M. 

2) Addition. At t = tj = 1 + (j - l)d, for all j > 1, 
S a new coefficients get added. Denote the set of indices 
of coefficients added at t = tj by A = A(j). A new 
coefficient, i € A, gets added at an initial magnitude 
(its sign can be ±1) and then its magnitude increases at 
a rate until it either reaches M or for d time units. 
Thus, the maximum magnitude of the i th coefficient is 
min(M, da,i) for i ^ No, and is M for i e No- 



3) Removal. S a coefficients get removed at t = tj +1 — 1 = 
jd for all j > 1. Denote the set of indices of coefficients 
which get removed at — 1 by 7Z = TZ(j). During 
[tj+i — r, tj + i — 1], the elements of 1Z start to decrease 
and become zero at t = tj+i — 1. For coefficient, i, the 
rate of decrease is min(M, dai)/r per unit time. 

4) The sets A(j) and IZ(j) are disjoint, i.e. the coefficients 
that just got added do not get removed. 

Thus at any t € [tj,tj + i — r — 1], the support can be split as 
A (increasing coefficients) and N\A (constant coefficients), 
where N = N t = N t] . At any t e [t j+1 - r,t j+ i - 2], it 
can be split as A (increasing), 1Z (decreasing), N \ (A U 1Z) 
(constant). At t = tj+i - 1, N = N t = N tj \ H (all constant). 

Notice that in the above model the signal support size 
remains roughly constant. It is So or (So — S a ) at all times. 
Also, the maximum signal power is bounded by SoM 2 . 

B. Three Key Lemmas 

Proving stability, i.e. showing that the number of misses, 
|A|, and extras, |A e |, remain bounded, requires finding suffi- 
cient conditions for the following three things to hold at the 
current time: (a) one, or a certain number of, large undetected 
coefficients definitely get detected; (b) large enough detected 
coefficients definitely do not get falsely deleted, and (c) every- 
so-often the extras (false detects or true removals) definitely 
do get deleted, (a) and (b) are used to ensure that |A| remains 
bounded while (c) is used to ensure that |A e |, and hence 
1^1 < \N\ + |A e |, remains bounded. These three things are 
done in the following three lemmas. 

Lemma 2 (Detection condition): Assume that \T\ < St, 
|A| < Sa, and IHU < Vll^lli- The current largest 
magnitude undetected element, (xa)(i), will definitely get 
detected at the current time if St < S*, Sa < 5***, 



2^s T ,s A SaC" (St, Sa) < L 
2a 2 + 2C" (St, |A| 



and 



max 

|A|<S, 



1 



on 2 

P St,|A| 



A|C"(S Tj |A 



TT < (£A)(1) 



(22) 



Lemma 3 (No false deletion condition): Assume that 
IMloo < A/||A||i, |f d et| < S T and |A de t| < S A . For a given 
bi, let Ti := {i e f det : xf > &i}. All i e T ; will not get 
(falsely) deleted at the current time if St < S*, and 

'' 2 (23) 



K > 2c4 



l^ll? 



l69 ST ^\A det \\\ 



Lemma 4 (Deletion condition): Assume that H^Hoo < 
A/mili, |f det | < S T and |A det | < S A - All elements of 
A e , det will get deleted if S T < S* and a 2 del > p^- + 

These lemmas follow easily from Corollary Q] and a few 
simple facts. They are proved in Appendix |B1 

C. The Main Result 

We analyze the LS-CS algorithm given in Sec. III-AI By 
running simple CS at t = with an appropriate number of 
measurements, no > n (usually much larger), we assume 
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DET A (i) 
No-FD A(i),...A m ,(N\A) 



DEL A c 
No-FD A,%,N\ (AUK) 



No-FD (N\A) ; 



DEL A e 
No-FD A, (N \ A) 



tj t i+ d 

(S- new adds) 



t h rr 



(S a removals) 



LS-CS = Genie-LS 



Fig. 2. Our approach to show stability (prove Theorem [2}- We split 
[tj,tj+i — 1] into the four sub-intervals shown above and ensure no 
false deletion (No-FD) / detection (DET) / deletion (DEL) of the 
coefficients listed in each. The notation A(i) refers to the i th largest 
increasing coefficient. Recall: in tj+i — r—l],Ais the increasing 
coefficients' set and N \ A is the constant coefficients' set. 

that we detect all nonzero coefficients and there are no false 
detects, i.e. No = Nq. This assumption is made for simplicity. 

For stability, we need to ensure that within a finite delay of 
a new addition time, all new additions definitely get detected 
(call this delay the "detection delay"). This needs to be done 
while ensuring that there are no false deletions of either 
the constant or the definitely detected increasing coefficients. 
Further, (a) by letting the delay between two addition times be 
larger than the "detection delay" plus the coefficient decrease 
time, r, and (b) by setting the deletion threshold high enough 
to definitely delete the extras in the duration after all detections 
are done, we can show stability. 

To obtain our result, the above is done by splitting [tj, tj+i — 
1] into the four subintervals shown in Fig. [2] and using 
the lemmas from the previous subsection to find sufficient 
conditions so that the following hold for some do < d: 

1) At all t G [tj, tj + do — 1], there is no false deletion of 
the constant coefficients (during this time the increasing 
coefficients may be too small and we do not care if they 
get detected or not). This ensures that the number of 
misses do not increase beyond S a . 

2) At t = tj + do+i — 1, for i = 1, . . . S a , (a) the i th largest 
increasing coefficient definitely gets detected, and (b) 
all constant coefficients and the first i largest increasing 
coefficients do not get falsely deleted. This ensures that 
by t = tj + do + S a — 1, the number of misses becomes 
zero, i.e. the "detection delay" is do + S a — 1. 

3) At t = tj + do + S a — 1, all false detects get deleted. 
This is needed to keep |T| bounded. 

4) At all t e [tj+do+Sa, tj+i— r— 1], (a) the current falsely 
detected set is immediately deleted and (b) none of the 
constant or increasing coefficients get falsely deleted. 

5) At all t e [tj+\ — r, t J+ i - 2], (a) the current falsely 
detected set is deleted and (b) none of the decreasing, 
constant or increasing coefficients are falsely deleted. 

6) At tj+i — 1, all falsely detected and removed coefficients 
are deleted and there is no false deletion. 

Doing the above leads to the following result. 
Theorem 2 (LS-CS Stability): Under Signal Model Q] if 
there exists a do < d, so that the following conditions hold: 

1) (initialization) all elements of xo get correctly detected 
and there are no false additions, i.e. No = No, 

2) (algorithm - thresholds) we set a ( iei = 2\A^VII^-I|i and 



we set a large enough so that there are at most / false 
detections per unit time, 

3) (measurement model) 

a) IMU < A/PUi, S a < S„, So + f(do + S a ) < 
5*, and 

b) 26 St .sJS a C"(St,S a ) < 1 with S T = S + 
f(d + S a ) and 5 A = S a 

4) (signal model - additions & no false deletions of increas- 
ing coefficients) the following hold for all i = 1, . . . S a 
and for all A = A(j) for all j: 

a) with S T = S + f(d + i-l) and S A = S a -i + l, 

imn{M,(d + i)(a A ) (i) ) 2 > 

2a 2 + 2C , (S T ,\A\) 



max 



|A|<s A l-2(^ T ,| A |)2|A|C"(5 T ,|A|) 

b) with S T = S + f(d + i), S A = S a - i and 

(a^)( Sa +i) = 0, 

min(M, (do + i)(oa) w ) 2 > 2a 2 del + (8nA 2 /\\A\\\) 

a 

i) min(M, (d + i)(a^)( 4+1 )) 

5) (signal model - no false deletions of constant coeffi- 
cients) with St = So + f(do + S a ), S A = S a , 

mm(M,dmmai) 2 > 2a 2 del + (8?i.A 2 /P|| 2 ) 

i 

+ 168s T ,S & 2 S a min(M, (d + S a ) maxaj) 2 

i 

6) (signal model - no false deletions of decreasing coeff's) 

min(M, dmin ai) 2 > r 2 (2a 2 del + (4?iA 2 /|j A\\D) 

i 

7) (signal model - delay b/w addition times large enough) 

d> d Q + S a + r 

where C'{., .), C"(., .) are defined in (O, 
then, 

1) at all t, \A\ < S a , |A e | < f(S a + do) and \f\ < 
So + f(S a + do) and the same bounds also hold for 
|A|, |A e |, |T| respectively; and 

2) for all t e [tj+do+S a -l,t j+1 -l], |A| = = |A e |, and 
thus N t = N t (LS-CS estimate = genie-LS estimate). 

The proof is given in Appendix ICl Note that in min^ a^, the 
min is taken over i £ [1, m] and same for max, a^. We now 
give a simple corollary of Theorem|2] (proved in Appendix |Db. 

Corollary 3: If the conditions given in Theorem |2] hold, 

1) at all t, the LS-CS error satisfies 

|| (art - £*)aI| 2 - min(M, (d + S a ) maxa*) 2 

i 

\\(x t - x t ) f \\ 2 < 89 2 S a min(M, (do + S a ) max a,) 2 + 
(4nA 2 /||A|| 2 ) 

with 6 computed at S T = S + f(d + S a ), S A = S a 

2) at all t, the CS-residual error, \\xt— xt csres|| 2 , is bounded 
by 

max(B ,C + 9 2 C"S a min(A7, (do + S a ) max a;) ) 
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with 9,C',C" computed at S T = So + /(do + S a ), 
Sa = S a , and Bo defined in (TlTb . 

Remark 4: Note that the initialization assumption is not 
restrictive. Denote the bound given by Theorem 1.1 of |6| 
for S = So — S a by Bl. It is easy to see that this assumption 
will hold if the addition threshold at t — is Oinit = V Bl 
(ensures no false detects) and if M > Oi n n + V Bl = 2V Bl 
(ensures all true adds detected). If the noise is small enough, 
by choosing no large enough, we can make Bl small enough. 
Even if this cannot be done, our result will only change 
slightly. The misses can be combined with the new additions 
at t = 1. Extras will at most increase the bound on |T| by /. 

Remark 5: By using Corollary |2j instead of Corollary Q] as 
the starting point for proving the above result, it should be 
possible to weaken conditions [3b] and [4a] We have not done 
this here, in order to convey the basic idea in a simpler fashion. 

D. Discussion and Extensions 

Notice that Signal Model Q] results in bounded SNR and 
roughly constant signal support size at all times. Theorem [2] 
and Corollary [3] show that under Signal Model Q] and under 
the initialization assumption (made only for simplicity), if 

1) the noise is bounded and n is large enough so that 
condition [3] holds, 

2) the addition/deletion thresholds are appropriately set 
(condition [2]), 

3) for a given noise bound and n, the smallest rate of coef- 
ficient magnitude increase is large enough (condition [4]i 
and the smallest constant coefficient magnitude is also 
large enough (conditions [5] [6]), 

4) and the delay between addition times is larger than 
the "detection delay" (which in turn depends on the 
magnitude increase rate), i.e. condition [7] holds, 

then, 

1) the number of misses, |A| < S a , and the number of 
extras, |A e | < f(S a + do) and the same bounds hold for 
|A|, |A e | (here do < d is the smallest integer for which 
the conditions of Theorem [2] hold), i.e. "stability" holds; 

2) within a finite "detection delay", do + S a — 1, all new 
additions get detected and not falsely deleted (|A| = 0), 
and the extras get deleted (|A e | = 0), i.e. the LS-CS 
estimate becomes equal to the genie-LS estimate; 

3) and the LS-CS error and the CS-residual error are 
bounded by time-invariant values. 

From Assumption Q] (given in Sec. II-Bb . S a <C So- When 
n is large enough (as required above), it is easy to set a so 
that / is small, e.g. in our simulations the average / was often 
less than 1 while So = 20. With a fast enough signal increase 
(as required above), do will also be small. Thus we can claim 
that |A| and |A e | will be bounded by a small value compared 
to the signal support size, So, i.e. "stability" is meaningful. 

Under the above assumptions, compare our requirements 
on n (condition [3J to those of the CS error bound (6|, which 
needs So < >S**. The comparison is easier to make if we 
slightly modify the definition of 5** to be the largest S for 
which 62s < 1/2 and 62s + @s,2S < 1 (this will imply that 
25** < S**). Clearly S a < 5*** is much weaker than So < 5**. 



Also, S + f(d + S a ) < S* is weaker than 2S < S*. Finally, 
if /, do , S a are small enough, condition [3b] is also weaker. 

Notice that our signal model assumes that support changes 
occur every d time instants. This may be slightly restrictive. 
But it is necessary in order to answer our second question (do 
the support errors ever become zero). If we do not care about 
answering this, we can assume a signal model with d = 1 and 
modify our arguments to still ensure stability. But the support 
errors may never become zero. We do this in l24l . 

Also, note that if r is large (slow rate of decrease), condition 
[6] becomes difficult to satisfy. If we remove this, we may not 
be able to prevent false deletion of the decreasing coefficients 
when they become too small (go below ctdel + pjf^)- But 
since they are small, this will increase the CS-residual error 
at the next time instant only slightly. With small changes to 
our arguments, it should be possible to still prove stability. 

V. Numerical Experiments 

In Sec. IV-AI we study a static problem and compare CS- 
residual error with that of CS. In Sec. IV-BI we verify the 
stability result of Theorem [2] In Sec. IV-CI we simulate lower 
SNRs and faster additions. In all these simulations, A was 
random-Gaussian. We averaged over 100 simulations (noise 
and signal supports for all times randomly generated) for all 
the time-series simulations and over 50 for the static one. In 
Sec. IV-DI we show a dynamic MRI reconstruction example. 
All our code used CVX, www.stanford.edu/~boyd/cvx/ 

A. Comparing CS-residual with CS 

We simulated a single time instant reconstruction problem 
(reconstruct x from y := Ax + w) with m = 200, \N\ = 20, 
and with |A| = 0. 1 1 TV | = 2 = |A e |. The noise if was zero 
mean i.i.d Gaussian. The nonzero signal values, xn, were i.i.d. 
±1 with equal probability. The sets iV, ACJV and A e C N c 
were uniformly randomly generated each time. We used four 
different noise standard deviations (a = 0.0439 * [1, 2, 4, 10]) 
and three different choices of n (45, 59, 100). In Table [J we 
compare the normalized MSE (NMSE) of CS-residual output 
with that of CS. CS (Dantzig selector) was run with different 
choices of A while for CS-residual we fixed A = 4er. Except 
when n = 100, in all other cases CS-residual outperforms CS 
significantly. For n = 100 (large n), if a = 0.04, CS (with 
smallest A) is better, and if a = 0.09, both are similar. 

A few other observations. (1) When n is small, the best CS 
error occurs when we run it with the smallest A. Smaller A 
reduces the size of the feasible set and thus the t\ norm of the 
minimizer, x, is larger, i.e. more of its elements are nonzero (if 
A is too large, x = will be feasible and will be the solution). 
(2) We also compared the CS-residual error with the error of 
the final LS-CS output (not shown). Only when CS-residual 
error was small, the support estimation was accurate and in 
this situation the final LS-CS error was much smaller. 

B. Verifying LS-CS stability 

In Fig. [3] we verify the stability result. We simulated Signal 
Model Q] with m = 200, S = 20, S a = 2 and with d = 



TABLE I 

Comparing normalized MSE of CS-residual(with A = 4<r) with that of CS (Dantzig Selector (DS)) with three different A's. We 
used m = 200, \N\ = 20, A = |A e | = 2. Comparison shown for three choices of n = 45, 59, 100 in the three tables below. 



(n = 45) 


n=45 


n=45 
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n=45 




(n = 59) 
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ct=0.04 
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Fig. 3. Verifying the stability result. 
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(a) Low SNR, Slow adds, n=59 (b) Low SNR, Slow adds, n=59 
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(c) Low SNR, Fast adds, n=59 



(d) Low SNR, Fast adds, n=59 



Fig. 4. Lower SNRs and Faster additions. LS-CS-no-deletion refers 
to LS-CS without deletion step, y axis is log scale in Fig. |4(a)|4(c)| 

8, r = 2, M = 3. Half the a/s were 0.5, the other half 
were 0.25. We used n = 59 and noise was unif(~c,c) with 
c = 0.0528. The LS-CS algorithm used A = c||A||i = 0.35, 
a = c and a del = 2.28c. We assumed that the initialization 
condition holds, i.e. we started LS-CS with No = No. In 
all 100 simulations, the number of misses and extras became 
exactly zero within do + S a — 1 = 4 time units of the addition 
time, i.e. the LS-CS estimate became equal to that of the genie- 
LS. Thus do was at most 3 in the simulations. The NMSE of 
LS-CS is < 0.4% while that of CS with small A, is 30-40%. 



So = 20, S a = 2 and the noise is unif(—c, c). Also, we used 
a smaller A, A = c||A||i/2 since it encourages more additions. 

We define two quantities: minimum average signal to noise 
ratio (min-SNR) and maximum average signal to noise ra- 
tio (max-SNR). Min (max) SNR is the ratio of minimum 
(maximum) average signal magnitude to the noise standard 
deviation. For unif(—c, c) noise, the standard deviation is 
c/v3. Min-SNR, which occurs right after a new addition, 
decides how quickly new additions start getting detected 
(decides do). Max-SNR decides whether |A| becomes zero 
before the next addition. Both also depend on n of course. 

For the previous subsection (Fig. [3]), c/ V3 = 0.03. Mini- 
mum average signal magnitude was (0.5 + 0.25)/2 = 0.375 
while maximum was (M + rfmirtj a,i)/2 = (3 + 8 * 0.25)/2 = 
2.5. Thus min-SNR was 12.3 while max-SNR was 82. 

In slow-adds (Fig. [4(a)l [4(b)) , we use n = 59, c = 0.1266 
and Signal Model Q] with ai = 0.2, M = 1, d = 8 and r = 3. 
Thus min-SNR was 0.2 * V^/0.1266 = 2.73 while max-SNR 
was 1 * V3/0.1266 = 13.7 (both are much smaller than 12.3 
and 82 respectively). LS-CS used A = 0.176, a = c/2 = 
0.06 = ctdei- Also, it restricted maximum number of additions 
at a time to 5 a +l. We also evaluated our assumption that CS at 
t = done with large enough no finds the support without any 
error. With no = 150, this was true 90% of the times, while 
in other cases there were 1-2 errors. Notice the following. 
(1) Most additions get detected within 2 time units and there 
are occasionally a few extra additions. This is because we 
set a = ctdei — c/2 (both very low). (2) As long as A' t At 
remains well-conditioned, a few extras do not increase the 
error visibly above that of the genie-LS. Notice from the plots 
that even when LS-CS « genie-LS, the average extras, |A e |, 
are not zero. (3) LS-CS error (NMSE) is stable at 2.5% while 
the CS errors are much larger at 40-60%. 

In fast-adds (Fig. [4(c)) [4(d)| , we use n = 59, c = 0.0528 
and a slightly modified Signal Model Q] with a, = 0.2, M = 1, 
d = 3 and r = 2. Thus min SNR was 0.2 * V3/0.0528 = 6.6 
while max SNR was 0.6*^3/0.0528 = 19.7. Both are smaller 
than the stability simulation, but larger than the slow-adds 
simulations. This was needed because in this case the delay 
between addition times was only 3, and so quick detection 
was needed to ensure error stability. LS-CS used A = 0.176, 
a = c = 0.05 = ctdei an d maximum additions per unit time 
of S a = 2. LS-CS error (NMSE) is still stable at 1%. 



C. Lower SNR and faster additions 

Next we ran two sets of simulations with much lower SNRs 
- slow-adds and fast-adds. Slow-adds used d = 8, while fast- 
adds had faster additions, d = 3. In all simulations, m = 200, 



D. Dynamic MRI reconstruction example 

To address a reviewer comment, in Fig. [3J we show the 
applicability of LS-CS to accurately reconstruct a sparsified 
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(b) Frames 2, 11, 20: original and reconstructed 
Fig. 5. Dynamic MRI. Reconstructing a sparsified cardiac sequence. 

cardiac image sequence from only 35% (simulated) MRI 
measurements. Detailed comparisons for actual (not sparsi- 
fied) image sequences, using practical MR data acquisition 
schemes, and with using BPDN are given in |9). 

For Fig. [3] the sparsity basis was the two-level Daubechies-4 
2D DWT. Images were 32x32 (to = 1024) and were sparsified 
by retaining the largest magnitude DWT coefficients that make 
up 99.5% of the total image energy and computing the inverse 
DWT. The support size of the sparsified DWT vector varied 
between 106-1 10, and the number of additions to (or removals 
from) the support from any t — 1 to t varied between 1-3. 
Denote the ID DWT matrix by W and the DFT matrix by 
F. Then $ = W <g> W and the measurement matrix, H = 
M rs (F®F)/32 where M rs is an n x m random row selection 
matrix and <E) denotes the Kronecker product. We used n = 
0.35m and no = 0.8m. Noise was zero mean i.i.d. Gaussian 
with variance a 2 = 0.125. Both LS-CS and CS used A = 1.5(7. 
We also tried running CS with smaller values of A: A = 0.15(7 
and A = 0.3er, but these resulted in (0]i being infeasible. 

VI. Conclusions 

We formulated the problem of recursive reconstruction of 
sparse signal sequences from noisy observations as one of 
noisy CS with partly known support (the support estimate from 
the previous time serves as the "known" part). Our proposed 
solution, LS CS-residual (LS-CS), replaces CS on the raw 
observation by CS on the LS residual, computed using the 
known part of the support. We obtained bounds on CS-residual 
error. When the number of available measurements, n, is small, 
we showed that our bound is much smaller than the CS error 
bound if |A|,|A e | are small enough. We used this bound 
to show the stability of LS-CS over time. By "stability" we 
mean that |A|, |A e | remain bounded by time-invariant values. 
Extensive simulation results backing our claims are shown. 

An open question is how to prove stability for a stochastic 
signal model that uses a random walk model with drift 
given by the current model for coefficient increase/decrease 
while using a (statistically) stationary model for "constant" 
coefficients, and that assumes a prior on support change, e.g. 
a modification of the model of l25l . Finally, in this work, we 



did not study exact reconstruction using much fewer noise-free 
measurements. We do this in 



Appendix 

A. CS-residual Bound: Proof of Lemma Q] and Theorem Q] 

1 ) Proof of Lemma \J} The proof is a modification of the 
proof of Theorem 1.3 given in (6). Let 5 = 62s, 9 = 9s,2S- 
Let C = C + h. Let T C T nz be a size 5" subset with S < 



min(SU, S nz ) and let T rest = T nz \T . \\w\\ 



< 



implies 



that \A/w\ < HAjIIiIHIoo < A. Thus eq. (3.1) of |6] holds 
with probability (w.p.) 1 and so £ is feasible. Thus, 



\\(h) T§ \ 
\A'Ah\\ ( 



1 < \\(h) To 

n < 2A 



2||(C). 



(24) 
(25) 



The second equation is eq (3.3) of J6J. The first follows by 



simplifying 



< 



iciii m. 



Recall that S*** is the largest value of S for which 5+9 < 1. 
Thus we can apply Lemma 3.1 of [6| for any S < 5**. Let 
T\ contain the indices of the S largest magnitude elements of 
h := C-C outside of T . Let T al := T UTi. Thus |T i| = 25 
and ||/iT ||fe < 11^-TbilU for any £k norm. Apply Lemma 3.1 of 
[|6l and use (2M and d25l l to upper bound its first inequality. 
Then use ||/i To ||i/\/5 < \\h To \\ < \\h T(n \\ to simplify the 
resulting inequality, and then use (a + 6) 2 < 2a 2 + 26 2 to 
square it. Finally, use ||(C)r °||i = II(C)t„J|i to get 



l T 01 



< 



165A 2 



? 2 IICtJ| 2 



{1-5 -9) 2 (1-5 



(26) 



Using (a + b) 2 < 2a 2 + 2b 2 to simplify the square of d24l) : 
using the resulting bound in the second inequality of Lemma 
3.1 of 10; and then finally using (1261) . we get 



< 



485A 2 



(1-5 -6f 



(8 + 24- 



1 



IICtJI 2 
S 



(27) 



Since = T„z \ T and |T | - 5, thus |T rest | - S nz - S. 
Thus IICr.ll? < (5„,-5)||Ct, c J| 2 . This gives ourresult which 
holds for any set To C T nz of size S < min(S'**, S nz ). ■ 

2) Proof of Theorem [7} The result follows by applying 
Lemma Q] with £ = ft, S nz = |T| + |A| and picking the 
set T rest of size |T| + |A| - S as follows. For S > |A|, pick 
T rest C T of size |T| + |A| - S and bound ||/3 T J| by \\f3 T \\. 
Use dT4b to bound \\I3t\\, and use <5| T | < 1/2 to simplify the 
final expression. For S < |A|, pick the set T rest as the set 
T union with |A| — S smallest elements of xa- Finally use 

^CSres = P + iinit and (3 = X - Xi n i t tO get /3 - /3 = X - icSres- 

Lastly, from the definitions, |T| = |7V| + |A e | - |A|. 

B. LS-CS Stability: Proofs of the Key Lemmas for Theorem^ 

The proofs of the three lemmas essentially follow from 
Corollary [T] and the following simple facts. 

1) An i e A (an undetected element) will definitely get 
detected at current time if x\ > 2a 2 + 2||x — £csres|| 2 

4 An i £ A will get detected if | (^cSres)i I > a - Since |(^csres)i| ^ \ x i\ ~ 
\%i -(^csrcjil > Nil - ll^-^csresll, this holds if \xi\> a+\\x-x CSres \\. 
This, in turn, holds if > la 2 + 2||a; — xcsrcs|| 2 - 
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2) An i £ (Tdet \ A e det) (a nonzero element of the current 
detected set) will definitely not get falsely deleted at the 
current time if xj > 2a\ el + 2\\(x — Xdet)-f dc , l| 2 - 

3) All i £ A e d et (a zero element of the current detected 
set) will get deleted if a 2 del > \\(x — Xdet)f . II 2 - 

4) If HHloo < A/||A|| a and |T det | < S„ 'then \\(x - 

idet)f d JI 2 < (4nA 2 /l|A|| 2 ) + se ltial | Adc ,| 2 lkA d JI 2 < 

(4nA 2 /P|| 2 ) + 80 | ^ MAdc , | 2 |A det |||x Ad J| 2 o 

5) The bound in fact[4]is non-decreasing in |T det | and |Ad e t|- 
Proof of Lemma [2] From Corollary [TJ and the fact that 

II^aII 2 < IAKxa) 2 ^ if |HU < A/PHi, \T\ < S, and 
|A| < S„, then \\x - i CSres || 2 < C + C" 6 2 \A\(x A ) 2 {11 with 
C, C", 9 computed at \T\, \A\. C, C" are defined in 

Using fact [T] from above, the largest undetected element, 
(%a)(i), will definitely get detected at the current time if 
(za) 2 !) > 2a 2 + 2C" + 2C"6» 2 |A|(x A ) 2 1) . Clearly this holds if 

29 2 \A\C" < 1 and ^fgt^C" < (xa) 2 {1) - If it is only known 
that |T| < St and |A| < Sa then our conclusion will hold if 
the maximum of the left hand sides (LHS) over \T\ < St and 
|A| < Sa is less than the right side. This gives the lemma. 
The LHS of the first inequality is non-decreasing in \T\, \A\ 
and hence is maximized for St, Sa- The LHS of the second 
one is non-decreasing in \T\ but is not monotonic in |A|. 
Proof of Lemma [3} It follows from facts [2] |4] and [5J 
Proof of Lemma [?] It follows from facts [3] E] and [5J 

C. LS-CS Stability: Proof of Theorem^ 

Let to = (call it the zeroth addition time). The first 
addition time, t\ = 1. We prove Theorem [2] by induction. 
At t — to — 0, all the So — S a coefficients are correctly 
detected (according to the initialization condition), and thus 
|A| = |A e | = and |T| = \N\ =S - S a . Thus for the initial 
interval t £ [to, t\ — 1], our result holds. This proves the base 
case. Now for the induction step, assume that 

Assumption 2 (induction step assumption): The result 
holds for all t £ [tj-\,tj — 1]. Thus at t = tj — 1, 
|A| = |A e | = and \f | = \N\ = S - S a . 
Then prove that the result holds for t £ [tj,tj+i — 1]. The 
following facts will be frequently used in the proof. 

1) Recall that tj + i = tj + d. Also, coefficient decrease 
of the elements of 1Z begins at tj+i — r = tj + d — r 
and the coefficients get removed at tj+i — 1. Since d > 
do + Sa+r (condition|7]of the theorem), thus, coefficient 
decrease does not begin until tj + do + S a or later. 

2) At all t £ [tj,t j+ x-2], \N\ = So, while at t = t 3+1 - 1, 
\N\ = So — S a . Also, there are S a additions at t = tj 
and none in the rest of the interval [tj,tj + i — 1]. There 
are S a removals at t = tj+i — 1, and none in the rest of 
the interval before that. 

3) A t C At-x U (N t \ N t -i) and A e , t C A M _i U 
(Nt-i \Nt). If there are no new additions, A t = Af_i. 
Similarly, if there are no new removals, A e t = A e ,t-i. 

The induction step proof follows by combining the results 
of the following six claims. In each claim, we bound |A|, |A e |, 
\T\ in one of the sub-intervals shown in Fig.|2] Using the last 
two facts above, the bounds for |A|, |A e |, \T\ follow directly. 



Claim 1: At all t = tj + i, for all i = 0, 1, . . . do — 1, 
|A| < S a , \A e \ <(i + 1)/, \f I < So + (i + 1)/. 

Proof: We prove this by induction. Consider the base case, 
t = tj. At this time there are S a new additions and |iV| = 
So- Using Assumption 12 (induction step assumption), |A| = 
S a , \A e \ = 0. In the detection step, |A det | < |A| = S a and 
so H^AdJI 00 — min(Af, (a^i)(i))- There are at most / false 
detects (condition 0I, so that |A e ,det| < + /. Thus, |Td et | = 
|^V| + |A e , det |-|A det |<6'o + /.' 

The smallest constant coefficient has magnitude 
min(M, dmini Oj). Apply Lemma [3] with St = So + f, 
Sa = S a , b\ = min(M, dminj aj). It is applicable since 
conditions [3a] and [5] hold. Thus none of the constant 
coefficients will get falsely deleted and so |A| < S a . Also, 
clearly |A e j < |A e , de ,| < /. Thus \ f\ < S Q + f- 

For the induction step, assume that the result holds for 
tj+i — 1. Thus, at t = tj + i, |A e . t | = |A e . t _i| < if and 
| At | = |Af_i| < S a - Using condition [2] after the detection 
step, |A e , det | < (i + 1)/. Thus, |f det | < So + (i + l)f- Also, 
|A det | < S a and so ||a: Adet ||oo < min(M, (i + l)(a^) (1) ). 
Applying Lemma |3] with St = S + (i + 1)/, Sa = S a , 
bi = min ( M, d mini cii) (applicable since conditions l3al 151 
hold), none of the constant coefficients will get falsely deleted. 
Thus, |A| < S a - Also, clearly |A e | < |A e det | < Thus 
|f|<5 + (i + l)/. 

Claim 2: At t = tj + do + i — 1, for all i = 1, . . . S a , 
\A\ <S a -t, \A e \ < (do + i)/, |f| < S + (do + i)f, and the 
first i largest increasing coefficients are definitely detected. 

Proof: We prove this by induction. Consider the base case 
t = tj +do- Using the previous claim, |A| < S a , |A e | < dof, 
\T\ < So + dof. At this time, either the largest element of A, 
which has magnitude min(M, (do + l)(aX)(i)), has already 
been detected so that the number of undetected elements 
already satisfies |A| < S a — 1 or it has not been detected. 
If it has been detected, then |A det | < |A| < S a - 1. If it has 
not been detected, then (xa)(i) = min(M, (do + l)(a_A)^). 
Apply Lemma [2] with 5a = S a , St — So + dof. It is 
applicable since conditions [3a] [3b] hold and condition |4a] 
holds for i = l. Thus the largest element will definitely 
get detected. Thus, in all cases, |Ad e t| < S a — 1 and so 
IkAdJU < min(M, (d + l)(a^) (2) ). Using condition [2] 
|A e>de t| < (do + 1)/ and so |f det | < S Q + (d + 1)/. 

Applying Lemma [3] with St = S + (do + l)f, Sa = S a — 1, 
bi = min(M, (do + l)(a^i)( 1 ) (applicable since condition l3al 
holds and l4b] holds for i = 1), the largest increasing coefficient 
will not get falsely deleted. Further, applying Lemma [3] with 
bi = min(M, drain.; a^) (applicable since conditions l3al and 151 
hold), none of the constant coefficients will get falsely deleted. 
Thus, |A| < S a - 1. Also |A e | < |A e det | < (d + 1)/ and so 
|T| < So + (d + l)f. 

For the induction step, assume that the result holds for 
tj + do + i- 2. Thus, at t = tj + do + i - 1, | A| < S a - i + 1, 
|A e | < (d + i - l).f, \T\ < So + (do + i- l)f and the first 
i — 1 largest elements have already definitely been detected. 
Either the i th largest element has also been already detected, 
in which case |A| < S a — i or it has not been detected. 
If it has, then |Ad e t| < |A| < S a — i- If it has not been 
detected, then (xa)(i) = min(M, (do +i)(a^)^). As before, 
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use conditions [3a] [3b] and [4a] and apply Lemma [2] to claim 
that the i th largest element will definitely get detected. Thus, 
in all cases, |A det | < S a — i and so ||oo < min(M, (d a + 

Using condition [2] |A e)det | < (d + i)f and 
so |T det | < So + (do + i)f. Also as before, apply Lemma 
[3] first with bi = min(M, (do + i)(a>A,)(i)) and then with 
bi = min ( M, d min^ ai) (applicable since conditions [3al l4bl 
[5] hold) to claim that all constant coefficients and all the i 
largest increasing coefficients will not get falsely deleted. Thus 
|A| < S a -i. Also, |A e | < (do+i)/ and |T| < S + (d Q + i)f. 
Claim 3: At * = tj + do + S a - 1, |A e | = 0. 
Proof: In the previous proof we have shown that at t = tj + 
d Q + S a - 1, i.e. for i = S a , | A det | = and |T det | < S + (d + 
S a )f. Apply Lemma E] with 5 A = 0, S T = S + (d + S a )f 
(applicable since conditions l3al l2l hold). Thus all false detects 
will get deleted, i.e. |A e | =0. 

Claim 4: At all t £ [tj+do + S a -l,t j+ i-r-l], |A| = 0, 
|A e | = 0. Thus f = N t and \f\ = \N t \ = So- 
Proof: Using the previous two claims, the result holds for 
t = tj+do + S a — l (base case). For the induction step, assume 
that it holds for tj +do +S a +i— 1 . Thus, at t = tj+do + S a +i, 
|A| = 0, |A e | = and \T\ = S . Since |A det | < |A|, |A det | = 
and thus H&2L II °° = 0. Using condition [2] |A e ,det| < + / 
and thus |T det | < So + /• Use conditions [3a] gb] (for i = S a ) 
and [5] to first apply Lemma [3] with St = So + f, Sa = 0, 
bi = min(M, (do + S a + i + l)(a,A)(s a ) (smallest increasing 
coefficient) and then with 61 = min(M, dmirij a,) (smallest 
constant coefficient) to show that there are no false deletions 
of either constant or increasing coefficients. Thus |A| = 0. 
Use conditions [3a] and [2] and apply Lemma [4] with 5a = 0, to 
show that |A e | =0. 

Claim 5: At t G [t j+1 - r,t j+1 - 1], |A| = 0, |A e | = 0. 
Thus f = N t and |T| = |iV t | = So- 
Proof: The proof again follows by induction and arguments 
similar to those of the previous claim. The only difference 
is the following. At any t = tj+i — r + i — 1, one applies 
Lemma [3] three times: the first two times for increasing and 
constant coefficients (as before) and then a third time with 
S T = So + f, S A = 0, b x = ((* - l)/r) min(M, d(a n ) {Sa) ) 
(for the current smallest decreasing coefficient). This last one 
is applicable since conditions [3a] and [6] hold. 

Claim 6: At t = t J+1 -l, |A| = 0, |A e | = 0. Thus f = N t 
and |f| = |JV t | = S , -5„. 

The only difference at this time is that the decreasing coeffi- 
cients get removed. As a result, |iVt| = So — S a , |A e | = S a 
and |A e ,det| = S a + f- But |A det | = |A| = 0. As before, using 
conditions [3a] and [2] and applying Lemma [4] with 5a = 0, all 
extras will still get removed and so still |A e | =0. Everything 
else is the same as before. 

D. LS-CS Stability: Proof of Corollary\3\ 

We have shown that \f\ < 5 + f(d + S a ) and |A| < 
S a . We can bound ||x^|| as follows. In the first sub-interval, 
|A| < S a and the maximum value of any element of A t at 
any t in this interval is min(M, do{o>A(i))(i)) so that ||x^|| 2 < 
5 a min(M, do(a^(j))(i)) 2 - In the second sub-interval, at t = 



tj + do + i — 1, |A| < S a — i and ||x A ||oo < min(M, (do + 
i)(a>A(j))(i+i))- In the last two sub-intervals, |A| = 0. Thus, 

||x A || 2 < max jnax (S a - i)min(M, (d + i)(a.A(]))(i+i)) 2 

j i—0,...S a 

< S a min(M, (d + S a ) max a t ) 2 (28) 

i 

This gives the LS-CS error bound. In a similar fashion, we can 
argue that ||xa|| 2 < S a min(M, (do + S a ) max^ ai) 2 . Using 
this in Corollary Q] gives the CS-residual error bound. 
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